HIGH-RISK AI ENFORCEMENT DEADLINE · AUGUST 2026 · ALL HIGH-RISK SYSTEMS MUST BE FULLY COMPLIANT
Audit Framework · AI Agent Infrastructure

AI Agent Infrastructure Audit

A complete audit framework for AI agent systems — covering infrastructure, autonomous actions, audit trails, data governance, EU AI Act compliance, access controls, and incident response.

7 Audit Domains · 8 EU AI Act Obligations · 4 Risk Tiers Mapped · Aug 2026 High-Risk Deadline
D1 — Infrastructure & Systems
D2 — AI Actions & Tool Use
D3 — Audit Trail
D4 — Data & Privacy
D5 — EU AI Act
D6 — Access & Identity
D7 — Incidents & Failures
D1

AI AGENT INFRASTRUCTURE & SYSTEMS AUDIT

Systems
What makes this different from a standard IT audit: You are not just auditing servers — you are auditing an autonomous software stack that makes decisions and invokes real-world actions. Every component in the chain is a potential attack or failure surface.
AI agent stack — what the auditor must map
Agent infrastructure — top-down component map
[U] User / Trigger — human input or automated event
        ↓ input
[O] Orchestrator — LangChain / AutoGen / CrewAI
        ↓ prompt
[L] LLM API — GPT-4 / Claude / Gemini
        ↓ tool call
[T] Tool Layer — APIs, DBs, file system, web
        ↓ context
[M] Memory — vector DB / context store
        ↓ output
[R] Response — action or returned answer
Infrastructure audit — test procedure per component
Test 1
Orchestration layer review
  • Framework version pinned?
  • Dependency SCA scan run?
  • Config stored in version control?
  • No hardcoded prompts in source?
Test 2
LLM API connection security
  • API keys stored in secrets manager?
  • Model version pinned (no auto-upgrade)?
  • Rate limits and spend caps enforced?
  • DPA signed with LLM provider?
Test 3
Tool integration inventory
  • Full list of tools agent can invoke?
  • Each tool documented with purpose?
  • Unused tools disabled?
  • Any tool with write/delete access flagged?
Test 4
Memory & vector DB controls
  • Access control on vector DB?
  • PII not stored in context store?
  • Retention policy on memory defined?
  • Data poisoning prevention in place?
Test 5
Network segmentation
  • Agent process isolated in own network segment?
  • Egress filtering on outbound tool calls?
  • Agent cannot reach internal admin systems?
Test 6
Supply chain & dependencies
  • All packages locked in requirements file?
  • SBOM (software bill of materials) exists?
  • Known CVEs scanned and patched?
  • No unpinned pip install in CI/CD?
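The Test 6 pinning check can be automated. A minimal sketch, assuming a standard pip-style requirements file, that flags any dependency not pinned to an exact version:

```python
import re

# Hypothetical helper for Test 6: flag any dependency in a requirements
# file that is not pinned to an exact version ("pkg==1.2.3").
PIN_RE = re.compile(r"^[A-Za-z0-9_.\-\[\]]+==[A-Za-z0-9_.\-]+$")

def unpinned_requirements(lines):
    findings = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # drop comments and blanks
        if not line or line.startswith("-"):  # skip pip options like -r / -e
            continue
        if not PIN_RE.match(line):
            findings.append(line)
    return findings

reqs = ["langchain==0.2.5", "openai>=1.0", "requests", "# dev only"]
print(unpinned_requirements(reqs))  # → ['openai>=1.0', 'requests']
```

Range specifiers (`>=`) fail the check deliberately: they allow silent upgrades, which is exactly the supply-chain risk Test 6 targets.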
Outputs → Agent Infrastructure Map · Systems Inventory · Integration Risk Register · Supply Chain Assessment

D2

AI ACTIONS & TOOL USE AUDIT

AI Behaviour
The most novel audit domain. Traditional IT audits test whether humans followed procedures. Here you are testing whether an autonomous agent acted within permitted boundaries — without a human approving every step.
Tool permission matrix — what agents are allowed to invoke
Tool / Action — assessed across Read, Write, Delete, External Call, and Financial:
  • Web search
  • Database query — HITL
  • File system — HITL
  • Send email / message — HITL
  • Payment / transaction — HITL
Key: ✓ Permitted autonomously  |  ✗ Blocked by policy  |  HITL = Human-in-the-loop approval required before execution
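A permission matrix is only auditable if it is enforced in code rather than policy. A minimal default-deny sketch, with illustrative tool names and verdicts (not the audited matrix):

```python
# Sketch of enforcing a tool permission matrix in code: each (tool, action)
# pair maps to ALLOW, BLOCK, or HITL; anything unlisted is denied.
ALLOW, BLOCK, HITL = "allow", "block", "hitl"

PERMISSIONS = {  # illustrative entries only
    ("web_search", "read"): ALLOW,
    ("database", "write"): HITL,
    ("payment", "external_call"): HITL,
}

class PermissionDenied(Exception):
    pass

def invoke(tool, action, fn, approved=False):
    verdict = PERMISSIONS.get((tool, action), BLOCK)  # default-deny
    if verdict == BLOCK:
        raise PermissionDenied(f"{tool}.{action} blocked by policy")
    if verdict == HITL and not approved:
        raise PermissionDenied(f"{tool}.{action} needs human approval")
    return fn()  # only reached when the matrix permits the call
```

The default-deny lookup matters: a tool added to the agent but missing from the matrix is blocked, not silently permitted, and the `approved` flag models the HITL gate that Test 2 probes.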
AI actions audit — test procedure
Test 1
Tool boundary enforcement
  • Are tool permissions enforced in code, not just policy?
  • Can agent bypass permission by chaining tools?
  • Test: attempt unauthorised action — is it blocked?
Test 2
Human-in-the-loop (HITL) checkpoints
  • All destructive actions require HITL?
  • HITL cannot be bypassed by prompt instruction?
  • Timeout on pending HITL — agent stops, not proceeds?
Test 3
Prompt injection resistance
  • Agent tested against indirect prompt injection?
  • Web-retrieved content cannot override system prompt?
  • Input sanitisation on all external content?
Test 4
Multi-agent trust chains
  • Sub-agents cannot exceed orchestrator permissions?
  • Inter-agent communication authenticated?
  • Rogue sub-agent cannot elevate privilege?
Test 5
Rate limiting & scope caps
  • Max tool calls per session defined?
  • Max tokens / cost per run capped?
  • Runaway loop detection in place?
  • Spend cap enforced at API gateway level?
Test 6
Output sanitisation
  • Agent output filtered before downstream use?
  • PII stripped from outputs?
  • Code execution output sandboxed?
Outputs → Tool Permission Matrix · Action Risk Register · HITL Gap Analysis · Injection Test Results

D3

AUDIT TRAIL & OBSERVABILITY

Traceability
The EU AI Act requires automatic logging of high-risk AI operations. Without a complete, tamper-evident audit trail, you cannot investigate incidents, prove compliance, or explain decisions to regulators.
Complete log chain — every hop must be captured
Audit trail — required log events at each stage
[1] Session Start — user ID, timestamp, trigger source
[2] Prompt Log — system prompt + user input hash
[3] LLM Decision — model, version, token count, response
[4] Tool Invocation — tool name, params, result, latency
[5] HITL Event — approver ID, decision, timestamp
[6] Final Output — action taken or response returned
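A sketch of how these events can be made tamper-evident: each log entry embeds the SHA-256 hash of the previous entry, so editing or deleting any record breaks the chain on verification. Field names here are illustrative, not a prescribed log schema.

```python
import hashlib, json, time

# Minimal hash-chained audit log: any modification to a past entry
# invalidates every subsequent hash on verification.
def append_entry(chain, event):
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = {"ts": time.time(), "event": event, "prev": prev}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})

def verify_chain(chain):
    prev = "0" * 64
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != digest:
            return False  # chain broken: tampering or missing entries
        prev = entry["hash"]
    return True
```

In production the chain would be anchored in WORM storage as Test 2 requires; the hash chain alone proves integrity, not availability, so it complements rather than replaces append-only storage.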
Audit trail — test procedure
Test 1
Log completeness
  • Every tool call has a matching log entry?
  • No gaps between session start and end?
  • Replay 5 sessions — do logs reconstruct fully?
Test 2
Tamper-evidence
  • Logs written to write-once / append-only store?
  • Hash chain or WORM storage enforced?
  • No admin can delete logs without dual approval?
Test 3
Retention compliance
  • Retention period defined per data type?
  • High-risk AI logs retained minimum required period?
  • Automated deletion policy enforced at expiry?
Test 4
Searchability & incident response
  • Logs indexed by session ID, user, tool, time?
  • Can reconstruct full decision chain in <30 min?
  • Tested during tabletop incident exercise?
Test 5
PII handling in logs
  • PII in prompts masked / pseudonymised?
  • Log access restricted to authorised personnel?
  • Raw prompt content not stored in plain text?
Outputs → Observability Gap Report · Log Architecture Diagram · Retention Compliance Check · Incident Replay Test Result

D4

DATA GOVERNANCE & PRIVACY

GDPR · Data
AI agents process personal data differently from traditional systems. Context windows, RAG pipelines, and memory stores create new data flows that standard GDPR assessments miss entirely.
AI-specific data flows — each requires a lawful basis and DPA
Data entering the agent
  • User input (may contain PII)
  • RAG retrieval from knowledge base
  • Tool responses (CRM, DB, email data)
  • Memory recall from prior sessions
  • System prompt (may embed user profile)
Data processed by LLM
  • Full context window sent to external LLM
  • Fine-tuning datasets (if applicable)
  • Embedding generation for vector storage
  • Intermediate reasoning chains
  • Every token sent = a data transfer
Data exiting the agent
  • Agent outputs stored in logs
  • Actions written to downstream systems
  • Memory persisted to vector DB
  • Reports or emails sent externally
  • Embeddings retained indefinitely?
Test 1
LLM provider DPA
  • Signed DPA with OpenAI / Anthropic / Google?
  • Data processing terms reviewed by legal?
  • Provider certified for EU data processing?
Test 2
PII detection before LLM
  • PII scanner on all inputs before API call?
  • Names, emails, IDs masked or tokenised?
  • Medical / financial data never sent to LLM?
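As an illustration of the first check, a minimal pre-call scrubber that masks obvious email and IBAN patterns before the context window leaves your infrastructure. A production deployment would use a dedicated PII detection service, not two regexes:

```python
import re

# Illustrative pre-call scrubber: mask obvious PII patterns before the
# prompt is sent to an external LLM API. Patterns are deliberately naive.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
}

def mask_pii(text):
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)  # replace match with placeholder
    return text

print(mask_pii("Contact jane.doe@example.com about DE44500105175407324931"))
# → Contact [EMAIL] about [IBAN]
```

The placeholder labels preserve sentence structure so the LLM can still reason about the masked text; the auditor's question is whether such a scrubber sits on every path to the API, not just the obvious one.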
Test 3
RAG data governance
  • RAG source documents classified by sensitivity?
  • Access control on vector DB by user role?
  • Deleted docs removed from embeddings too?
Test 4
Right to erasure (GDPR Art. 17)
  • Can a user's data be fully deleted from logs?
  • Embeddings containing PII can be removed?
  • Test: submit erasure request — verify end-to-end
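The end-to-end erasure test can be scripted. A hypothetical walkthrough, with store names and shapes assumed for illustration: delete the subject's data from every store the agent touches, then verify nothing remains.

```python
# Hypothetical Art. 17 erasure walkthrough: 'stores' models every place the
# agent persists subject data (logs, vector DB, memory), keyed by subject ID.
def erase_subject(subject_id, stores):
    """stores: dict of store name -> dict keyed by subject_id."""
    report = {}
    for name, store in stores.items():
        removed = store.pop(subject_id, None) is not None
        report[name] = "deleted" if removed else "not found"
    return report  # evidence for the audit file, one line per store

def verify_erased(subject_id, stores):
    return all(subject_id not in store for store in stores.values())

stores = {
    "prompt_logs": {"u-123": ["hi"]},
    "vector_db": {"u-123": [0.1, 0.2]},
    "memory": {},
}
erase_subject("u-123", stores)
print(verify_erased("u-123", stores))  # → True
```

The point of the per-store report is auditability: an erasure request that silently skips the vector DB passes a naive check but fails this one.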
Test 5
Training data provenance
  • Fine-tuning data has documented lawful basis?
  • No customer data used in training without consent?
  • Data lineage documented end-to-end?
Outputs → AI Data Flow Map · GDPR Gap Report · Vendor DPA Register · Erasure Test Evidence

D5

EU AI ACT COMPLIANCE ASSESSMENT

Regulatory
First task: classify every AI agent by risk tier. The tier determines everything — obligations, deadlines, and whether third-party conformity assessment is required. Misclassification is itself a compliance failure.
Step 1 — classify each AI agent by risk tier
Banned
Prohibited Feb 2025
  • Biometric surveillance in public
  • Social scoring
  • Subliminal manipulation
  • Exploitation of vulnerable groups
Audit action
  • Confirm not deployed
  • Document classification decision
High Risk
Deadline: Aug 2026
  • CV screening agents
  • Credit risk agents
  • Medical decision agents
  • Fraud detection (financial)
  • Critical infrastructure ops
Full compliance suite
  • All 8 obligations apply
  • Conformity assessment required
  • EU AI database registration
Limited
Deadline: Aug 2026
  • Customer service chatbots
  • AI writing assistants
  • Content generation agents
Transparency only
  • Disclose AI interaction
  • Label AI-generated content
Minimal
No deadline
  • Internal knowledge agents
  • Code assistants (internal)
  • Summarisation tools
Voluntary best practice
  • Document classification
  • Apply voluntary code
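A first-pass tier classifier can make Step 1 repeatable across an agent inventory. The keyword lists below are assumptions for illustration; real classification requires legal review against Annex III, not string matching.

```python
# Illustrative first-pass risk-tier classifier: map an agent's use-case
# description to a candidate EU AI Act tier. Checked in order of severity
# so a system matching both "high" and "limited" keywords lands in "high".
TIER_KEYWORDS = {
    "prohibited": ["social scoring", "biometric surveillance"],
    "high": ["cv screening", "credit risk", "medical decision", "fraud detection"],
    "limited": ["chatbot", "content generation", "writing assistant"],
}

def classify(use_case):
    text = use_case.lower()
    for tier, keywords in TIER_KEYWORDS.items():
        if any(k in text for k in keywords):
            return tier
    return "minimal"  # default tier still requires a documented decision

print(classify("CV screening agent for recruitment"))  # → high
print(classify("Internal summarisation tool"))         # → minimal
```

Even as a triage tool this is useful evidence: it forces every agent through the same decision path, and the "minimal" default reminds the auditor that no classification means an undocumented one.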
Step 2 — test all 8 EU AI Act obligations for high-risk agents
Obligation
Legal requirement
Auditor tests
1. Risk management system
Documented, iterative risk process covering full AI lifecycle
Risk register exists? Updated at each model change? Reviewed by accountable owner?
2. Data governance
Training data documented, bias-examined, quality-checked
Data lineage documented? Bias metrics computed and within thresholds? Signed off?
3. Technical documentation
Complete technical file covering capabilities, limitations, architecture
Technical file current? Version-controlled? Accessible to regulators within 72hr?
4. Automatic logging
AI system generates automatic logs of operations for traceability
Logs auto-generated? Capture all decisions? Retained for required period? Tamper-evident?
5. Transparency to users
Users informed they are interacting with an AI system
Disclosure present before first interaction? Clear and prominent? Not buried in T&Cs?
6. Human oversight
Human operators able to monitor, intervene, stop the system
Kill switch tested? Override mechanism documented? Oversight role assigned and trained?
7. Accuracy & robustness
Consistent performance, resilience to errors and adversarial inputs
Accuracy metrics documented? Adversarial testing done? Edge case behaviour defined?
8. Conformity assessment
Self-assessment or third-party audit + EU AI database registration
Assessment completed? Registered in EU AI database? CE marking applied if required?
Step 3 — establish provider vs deployer obligations split
PROVIDER — built the AI system
Obligations of the system builder
  • Technical documentation and data governance
  • Conformity assessment before market placement
  • EU AI database registration
  • CE marking on high-risk systems
  • Post-market monitoring and incident reporting
DEPLOYER — uses the AI system in a specific context
Obligations of the user organisation
  • Human oversight measures implemented
  • Input data quality and relevance maintained
  • Users informed of AI interaction
  • Fundamental rights impact assessment (FRIA)
  • Cannot deploy in ways that exceed intended purpose
Outputs → Risk Tier Classification · EU AI Act Gap Analysis · Conformity File Readiness · Provider / Deployer Split

D6

ACCESS, IDENTITY & PRIVILEGE CONTROLS

Access
AI agents are identities. A service account that runs an AI agent must be treated with the same rigour as a privileged human user — potentially more, since it can act autonomously at machine speed.
Test 1
Agent service account review
  • Every agent has a dedicated service account?
  • Least privilege enforced on each account?
  • No shared credentials across multiple agents?
  • No human account used as agent identity?
Test 2
Secrets management
  • All API keys stored in vault (not env vars)?
  • Key rotation enforced — max 90 day lifetime?
  • No secrets in source code or config files?
  • Secret scanning in CI/CD pipeline?
Test 3
Deployment pipeline access
  • Who can deploy or modify agent config?
  • MFA enforced on deployment pipeline?
  • Change approval required before prod deployment?
Test 4
Agent-to-agent authentication
  • Sub-agents must authenticate to orchestrator?
  • No implicit trust between agent processes?
  • Token-based auth with short expiry?
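The third check, token-based auth with short expiry, can be sketched with an HMAC-signed token minted by the orchestrator. The secret here is a placeholder; in a real deployment it would live in a vault, never in source.

```python
import hashlib, hmac, time

# Sketch of short-lived agent-to-agent auth: the orchestrator mints an
# HMAC-signed token; a sub-agent's request is rejected once it expires.
SECRET = b"demo-only-secret"  # placeholder: load from a vault in production
TTL_SECONDS = 60

def mint_token(agent_id, now=None):
    exp = int(now if now is not None else time.time()) + TTL_SECONDS
    msg = f"{agent_id}:{exp}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{agent_id}:{exp}:{sig}"

def verify_token(token, now=None):
    agent_id, exp, sig = token.rsplit(":", 2)
    msg = f"{agent_id}:{exp}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):  # constant-time compare
        return False
    return (now if now is not None else time.time()) < int(exp)

tok = mint_token("sub-agent-7", now=1000)
print(verify_token(tok, now=1030))  # → True  (within TTL)
print(verify_token(tok, now=2000))  # → False (expired)
```

Short expiry limits the blast radius of a leaked token, which is the property the auditor is testing; signed JWTs via mTLS would be the more common production choice.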
Test 5
Access reviews
  • Agent service accounts reviewed quarterly?
  • Unused agents decommissioned and deprovisioned?
  • Access review evidence retained?
Outputs → Agent Access Matrix · Secrets Management Review · Privilege Escalation Risk Report

D7

INCIDENT RESPONSE & FAILURE MODES

Risk
AI incidents are different from standard IT incidents. Hallucination, prompt injection, runaway loops, and cascading multi-agent failures require dedicated playbooks — and the EU AI Act requires serious incident reporting to regulators.
AI-specific failure modes the auditor must test
Hallucination
Confident incorrect output
  • Detection: output validation layer?
  • Containment: human review before high-stakes action?
  • Is hallucination rate benchmarked and tracked?
Runaway Loop
Agent calls itself recursively
  • Max iteration limit enforced in code?
  • Token / cost cap as secondary kill?
  • Loop detection alert fires within 60s?
Prompt Injection
Malicious instruction via input
  • Indirect injection via retrieved content?
  • System prompt cannot be overridden?
  • Adversarial test suite run regularly?
Cascading Failure
Multi-agent chain collapse
  • Failure in one agent isolated from others?
  • Circuit breaker pattern implemented?
  • Partial failure state handled gracefully?
Model Drift
LLM behaviour changes unexpectedly
  • Model version pinned — no silent upgrades?
  • Regression tests run on model update?
  • Rollback procedure tested and documented?
Unauthorised Action
Agent acts outside permitted scope
  • Kill switch tested and working?
  • Alert on out-of-scope tool invocation?
  • Rollback for side effects of actions?
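Several of these failure modes share one control: hard caps enforced in the agent loop itself, so a reasoning failure cannot spend unbounded money or run forever. A sketch with illustrative limits, combining an iteration cap and a token budget:

```python
# Runaway-loop and scope-cap guard: the loop stops hard at either an
# iteration limit or a token budget. Limits are illustrative.
MAX_ITERATIONS = 10
MAX_TOKENS = 50_000

class BudgetExceeded(Exception):
    pass

def run_agent(step_fn):
    """step_fn() returns (tokens_used, done). Raises at either cap."""
    tokens = 0
    for iteration in range(MAX_ITERATIONS):
        used, done = step_fn()
        tokens += used
        if tokens > MAX_TOKENS:
            raise BudgetExceeded(f"token cap hit at iteration {iteration}")
        if done:
            return iteration + 1  # number of steps actually taken
    raise BudgetExceeded("iteration cap hit: possible runaway loop")

# A step that never signals completion trips the iteration cap:
try:
    run_agent(lambda: (100, False))
except BudgetExceeded as e:
    print(e)  # → iteration cap hit: possible runaway loop
```

The cap belongs inside the loop, not only at the API gateway: gateway spend caps are the secondary kill listed above, while the in-process limit stops the loop before costs accrue.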
Incident response — test procedure
Test 1
Kill switch test
  • Emergency stop tested in staging within last 90 days?
  • Kill switch accessible to non-technical staff?
  • Agent cannot restart itself after kill?
Test 2
AI incident classification
  • AI incidents have their own severity taxonomy?
  • Hallucination = classified as incident?
  • Runaway loop = P1 incident automatically?
Test 3
EU AI Act serious incident reporting
  • Serious incident definition per EU AI Act known?
  • Regulator notification process documented?
  • Can notify regulator within required timeframe?
Test 4
Rollback and remediation
  • Model rollback tested — time to previous version?
  • Side effects of agent actions can be reversed?
  • Post-incident review process documented?
Test 5
Anomaly detection & alerting
  • Alerting on unusual token consumption?
  • Alert on unexpected tool call patterns?
  • On-call rotation covers AI incidents 24/7?
Outputs → Failure Mode Register · AI Incident Playbook · Kill Switch Test Evidence · Regulatory Notification Procedure

SC

COMPLIANCE READINESS SCORECARD

Executive Summary
Use this scorecard as your audit executive summary. One row per domain — each scored against five compliance dimensions. Present this to the board before the detailed findings report.
Audit Domain                     | Controls Exist | Designed Well | Operating | Logged | EU AI Act
D1 — Infrastructure & Systems    | PASS           | GAP           | GAP       | PASS   | GAP
D2 — AI Actions & Tool Use       | GAP            | FAIL          | FAIL      | GAP    | FAIL
D3 — Audit Trail & Observability | PASS           | PASS          | GAP       | PASS   | GAP
D4 — Data Governance & Privacy   | GAP            | GAP           | FAIL      | GAP    | FAIL
D5 — EU AI Act Assessment        | GAP            | FAIL          | FAIL      | FAIL   | FAIL
D6 — Access & Identity           | PASS           | PASS          | GAP       | PASS   | N/A
D7 — Incidents & Failures        | GAP            | GAP           | FAIL      | GAP    | FAIL
Scorecard key:  PASS  Control exists and operating effectively  |  GAP  Partial — improvement required  |  FAIL  Control absent or not operating — immediate action required
Note: The scorecard above shows a representative example of findings for a typical early-stage AI deployment. Replace each cell with actual test results from your fieldwork. D4 and D5 most commonly produce FAIL ratings in first-time AI Act audits: organisations underestimate how GDPR and EU AI Act obligations apply to AI-specific data flows such as context windows, RAG pipelines, and memory stores.